Video image monitoring system

ABSTRACT

This is a video image monitoring system which can effectively detect a mobile object appearing in a captured video image even if the background image and other camera conditions change continuously. The video image monitoring system comprises: a video-image-capturing section 100 for outputting image data based on a video image signal obtained by using a camera 10; a mobile-object-candidate-area-detecting section 101 for extracting a candidate area of a mobile object from the image data; and a mobile-object-detecting section 102 for determining whether the candidate area is the mobile object. The mobile-object-candidate-area-detecting section 101 quantizes the brightness gradient direction of the image data into direction codes, and calculates a spatio-temporal histogram which represents the frequency at which each direction code appears in a predetermined spatio-temporal space. After that, the mobile-object-candidate-area-detecting section 101 calculates a statistical spatio-temporal space evaluation value of the spatio-temporal histogram (a spatio-temporal space richness). The mobile-object-detecting section 102 uses the spatio-temporal space richness to determine whether the candidate area is the mobile object.

CROSS REFERENCE TO RELATED APPLICATION

The present patent application claims the benefit under 35 U.S.C. 119 of Japanese Patent Application No. 2009-048529 filed on Mar. 2, 2009, the disclosure of which is incorporated into this patent application by reference.

TECHNICAL FIELD

The present invention relates to a video image monitoring system having functions of: recording image data converted from a video image obtained by using an image-capturing device, such as a camera used with a video recording device, a monitoring device, or a mobile robot; detecting a trespasser by using an image recognition method; and detecting a person who approaches the mobile robot. In particular, the present invention relates to a video image monitoring system that is superior in detecting a mobile object even when the image-capturing device is movable.

BACKGROUND ART

A video image monitoring system performs image processing on a video image obtained by an image-capturing device such as a camera, and detects a mobile object such as a person or a vehicle appearing in a monitored area. A video image monitoring system of this kind has various functions, such as recording only the video images in which a mobile object appears, displaying a warning icon on a display device, and alerting security personnel by sounding an alarm. It also reduces the workload of security personnel, whose monitoring operation formerly required continuous observation. In addition, the video images recorded by a video image monitoring system of this kind can be used for investigating criminal acts such as theft, or other illegitimate acts.

Recently, more and more image monitoring systems have been introduced in mass merchandisers, financial institutions, buildings, and offices, since society has become more aware of crime prevention because of rising crime rates, increasingly diverse crime patterns, and falling arrest rates. More and more cameras are installed at various locations, since the storage capacity of video recording devices has increased and network cameras have come into wide use. There is a growing demand for a surveillance assist function, since, as explained above, it is too burdensome for security personnel to watch recorded video continuously to find criminal acts.

In addition, the scope of application of the video image monitoring system has expanded to combined use with a pan-tilt-zoom (PTZ) camera, which has a zoom lens unit and is mounted on a camera platform that can be rotated and tilted, for tracking a trespasser, or with a camera mounted on a mobile robot for visual recognition. However, these applications have a problem: when the camera moves, not only the object to be monitored but also the background moves in the field of view of the camera. A known image recognition method ordinarily employed with a fixed camera for detecting a mobile object first produces a reference background image and then calculates the difference between the reference background image and each newly input image. Such a method cannot be applied directly when the camera moves, because the background then changes from frame to frame.

A conventionally known image-processing device (see Patent Document 1) detects changes among images produced from a video image captured by a PTZ camera, and then estimates the camera framing based on the result of the detection. The previously produced images are transformed based on the estimated camera framing, and then images of a mobile object are extracted by using the transformed images and the video image captured by the camera. The image-processing device then performs an image recognition process on the extracted images of the mobile object.

In addition, image processing must be stable against varying brightness and noise when the image-capturing condition varies, e.g. outdoors. A known method disclosed in Non-Patent Document 1 is robust against varying brightness, since in this method the brightness gradients in the images are encoded into direction code data.

PRIOR ART DOCUMENT

Patent Document

-   [Patent Document 1] Japanese Patent Laid-open Publication No. 2002-344960

Non-Patent Document

-   [Non-Patent Document 1] ULLAH Farhan et al., “Orientation Code Matching for Robust Object Search (Special Issue on Image Recognition and Understanding)”, IEICE Transactions on Information and Systems, Vol. E84-D, No. 8, pp. 999-1006, Aug. 1, 2001, The Institute of Electronics, Information and Communication Engineers

DISCLOSURE OF THE INVENTION

Problems to be Solved by the Invention

To summarize, the image-processing device disclosed in Patent Document 1 reconstructs the reference background image by using the video image captured by the PTZ camera and information regarding the panning movement of the PTZ camera. However, this device has the problem that the processing load of reconstructing the background images by transforming the previously obtained images is extremely large. In addition, the device disclosed in Patent Document 1 can perform its image processing only if the pan and tilt information of the PTZ camera is available. A possible increase in the cost of the video camera system is another problem.

In addition, it is difficult to use the method disclosed in Non-Patent Document 1 under conditions where the image-capturing condition changes dynamically, since this method is intended only for a fixed camera, for identifying images against a predetermined template or for inspection purposes.

An object of the present invention is to provide a video image monitoring system which can detect a mobile object appearing in a captured video image even when the background image and the camera conditions change continuously.

Means for Solving the Problem

In order to solve the aforementioned problem, a video image monitoring system according to the present invention includes: a mobile-object-detecting section for detecting a mobile object from a video image signal obtained by using an image-capturing device such as a camera; a recording section for recording, in a recording medium, information on the mobile object detected by the mobile-object-detecting section and the video image captured by a video-image-capturing section; an output section for outputting information on the mobile object detected by the mobile-object-detecting section; and a mobile-object-candidate-area-detecting section connected upstream of the mobile-object-detecting section. The mobile-object-candidate-area-detecting section includes: a direction code calculation section for calculating a direction code by quantizing the brightness gradient direction of an input image; a spatio-temporal histogram calculation section for calculating a spatio-temporal histogram which represents the frequency at which each direction code appears in a predetermined space spanning a plurality of images; and a spatio-temporal space evaluation criterion calculation unit for calculating a statistical spatio-temporal space evaluation criterion of the spatio-temporal histogram. The video image monitoring system according to the present invention determines from the spatio-temporal space evaluation criterion whether the mobile object candidate is the mobile object.

Effect of the Invention

The present invention can provide a video image monitoring system which can detect a mobile object appearing in a captured video image even when the background image and other camera conditions change continuously, by adopting a spatio-temporal space richness that evaluates the variation of the direction codes in the spatio-temporal space.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram showing a video image monitoring system of one embodiment of the present invention.

FIG. 2 is a block diagram showing the details of a mobile-object-candidate-area-detecting section.

FIG. 3 is a flowchart of processes conducted by a spatio-temporal space richness calculation unit.

FIGS. 4A to 4D show a method of direction encoding. FIG. 4A shows an original image. FIG. 4B shows an image filtered by an edge enhancement filter. FIG. 4C shows the brightness gradient directions of the image shown in FIG. 4B. FIG. 4D shows the direction codes allocated to the brightness gradient directions shown in FIG. 4C.

FIG. 5A shows a concept of a spatio-temporal space. FIG. 5B shows an example of a spatio-temporal histogram in the spatio-temporal space shown in FIG. 5A.

FIGS. 6A and 6B show an example of a spatio-temporal histogram according to the present embodiment. FIG. 6A shows a spatio-temporal histogram of an area in which a mobile object appears. FIG. 6B shows a spatio-temporal histogram of a background area.

FIG. 7 is a block diagram showing the details of a mobile-object-detecting section.

FIG. 8 shows an example of an output image according to the present embodiment.

EMBODIMENTS FOR CARRYING OUT THE INVENTION

In the following, an embodiment of the present invention will be explained in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing a video image monitoring system 1 according to one embodiment of the present invention.

The video image monitoring system 1 includes a camera 10, an output section 20, a recording medium 30, and an image recognition device 40. More specifically, in this video image monitoring system 1, a computer system is applied to a system including the camera 10, the recording medium 30, and the output section 20. The hardware of the computer system includes CPUs, memories, and input/output interfaces, on which predetermined software is installed to provide the functions shown in the block diagrams of the accompanying drawings.

The camera 10 is an image-capturing device which includes a camera lens having a zoom function and an image-capturing element such as a complementary metal oxide semiconductor (CMOS) sensor or a charge-coupled device (CCD), neither of which is shown in the drawings. The camera 10 captures a video image and outputs it to a video-image-capturing section 100 of the image recognition device 40, which will be explained later. The camera 10 is a PTZ camera mounted on a camera platform which can be panned, tilted, and shifted.

The recording medium 30 may be an electronic recording medium such as a hard disk drive or a flash memory, or another data recording medium such as a magnetic tape storage device.

Parameter information and image information obtained by the image recognition device 40 are added to the video image captured by the camera 10, and the video image and the information are stored in the recording medium 30.

The output section 20 is a display device such as a liquid crystal display device or a cathode ray tube (CRT) display device. Instead of the output section 20, the video image monitoring system 1 may have another data output configuration, such as red-green-blue (RGB) monitor output or data networking. Parameters are set through a user interface. The user interface usable with the output section 20 includes an input device such as a mouse or a keyboard (not shown in the drawings), through which a user can enter various parameters.

The image recognition device 40 will be explained next in detail.

The image recognition device 40 includes the video-image-capturing section 100, a mobile-object-candidate-area-detecting section 101, a mobile-object-detecting section 102, and a recording section 103. The video-image-capturing section 100 captures the video image transmitted from the camera 10. The mobile-object-candidate-area-detecting section 101 calculates a candidate area of the mobile object. The mobile-object-detecting section 102 determines whether a mobile object candidate extracted by the mobile-object-candidate-area-detecting section 101 is a mobile object. Time information and the like are added to the video image captured by the video-image-capturing section 100, and the video image and the information are stored in the recording section 103.

The video-image-capturing section 100 converts the video image captured by and transferred from the camera 10 into image data suitable for an image recognition process or video recording, and outputs the image data. The image data are converted into a one-dimensional array format or a two-dimensional array format. In addition, in order to reduce the effects of noise and flicker, the video-image-capturing section 100 may apply preprocessing to the image data, such as smoothing filtering, edge enhancement filtering, or gray-level (density) conversion. The image data format is selectable from an RGB color format and a monochrome format. Furthermore, in order to reduce the processing cost, the image data may be resized to a predetermined size.
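The preprocessing options above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a frame arrives as nested lists of (R, G, B) tuples, uses the ITU-R BT.601 luma weights for the monochrome conversion, a 3×3 box filter as the smoothing step, and integer subsampling as the resize step. The function names are hypothetical, and none of these choices is specified by the embodiment.

```python
# Hypothetical preprocessing sketch for the video-image-capturing section 100.
# Assumptions: frames are nested lists of (r, g, b) tuples; BT.601 luma
# weights; 3x3 box smoothing; "resize" implemented as integer subsampling.


def to_monochrome(rgb):
    """Convert an RGB frame to a monochrome frame of float gray levels."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def box_smooth(img):
    """3x3 mean filter; at the borders only the existing neighbours are averaged."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def subsample(img, factor):
    """Reduce the frame size by keeping every `factor`-th row and column."""
    return [row[::factor] for row in img[::factor]]


def preprocess(rgb, factor=2):
    """Monochrome conversion, then smoothing, then resizing."""
    return subsample(box_smooth(to_monochrome(rgb)), factor)
```

Smoothing before subsampling is a deliberate order here: it limits aliasing from the size reduction, in line with the stated aim of suppressing noise and flicker.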

The mobile-object-candidate-area-detecting section 101 performs predetermined image processing on the image data transferred from the video-image-capturing section 100, and thereby extracts a candidate area of a mobile object appearing in the video image.

FIG. 2 is a block diagram showing the details of the mobile-object-candidate-area-detecting section 101.

The mobile-object-candidate-area-detecting section 101 includes: a spatio-temporal space richness calculation unit 200; a frame subtraction unit 201; a reference-still-image-producing unit 202 for producing a reference image 203; and a mobile-object-candidate-area-detecting unit 204.

The mobile-object-candidate-area-detecting section 101 performs the following image processing (1) to (3).

(1) The spatio-temporal space richness calculation unit 200 calculates a spatio-temporal space richness from the image data.

(2) The frame subtraction unit 201 calculates a frame difference between the reference image 203 and an input image, where the reference image 203 is calculated from the image data by the reference-still-image-producing unit 202, or may be calculated in advance under a predetermined condition.

(3) The mobile-object-candidate-area-detecting unit 204 calculates a mobile object candidate area by using the spatio-temporal space richness and the frame difference.

The spatio-temporal space richness calculation unit 200 has a function of calculating the spatio-temporal space richness. The spatio-temporal space richness is an example of the “spatio-temporal space evaluation value” recited in the claims. It is obtained by calculating the amount of information (entropy) of the direction-encoded image data.

FIGS. 4A to 4D show a method of direction encoding. FIG. 4A shows an original image. FIG. 4B shows an image filtered by an edge enhancement filter. FIG. 4C shows the brightness gradient directions of the image shown in FIG. 4B. FIG. 4D shows the direction codes allocated to the brightness gradient directions shown in FIG. 4C.

As shown in FIGS. 4A to 4D, in the direction encoding process, the brightness gradient of an image is calculated first; the brightness gradient direction is then quantized into predetermined directions and encoded.

FIG. 3 is a flowchart showing the processes conducted by the spatio-temporal space richness calculation unit 200.

First, the edge gradients $\Delta I_u$ and $\Delta I_v$ of each pixel p(x, y) of an input image $I_{xy}$ in the horizontal and vertical directions are calculated (step S1).

The edge gradients $\Delta I_u$ and $\Delta I_v$ are calculated by using an edge enhancement filter. If a Sobel filter is used as the edge enhancement filter, the filter coefficients $FLT_h$ in the horizontal direction and $FLT_v$ in the vertical direction are given by Equations (1). Other edge enhancement filters, such as a Prewitt filter, may be used instead of the Sobel filter.

[Equation 1]

$$FLT_h = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \qquad FLT_v = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \tag{1}$$

In the next step S2, an edge intensity $\rho_{xy}$ is calculated according to Equation (2) from the edge gradients $\Delta I_u$ and $\Delta I_v$ obtained with the filters of Equations (1).

[Equation 2]

$$\rho_{xy} = \sqrt{\Delta I_u^2 + \Delta I_v^2} \tag{2}$$

In the next step S3, the spatio-temporal space richness calculation unit 200 determines whether the condition $\rho_{xy} > \Gamma_\rho$ holds, where $\Gamma_\rho$ is a predetermined threshold.

If $\rho_{xy} > \Gamma_\rho$ holds (Yes in step S3), that is, if the edge intensity $\rho_{xy}$ is greater than the predetermined threshold $\Gamma_\rho$, the spatio-temporal space richness calculation unit 200 calculates an edge direction $\theta_{xy}$ in step S4 and proceeds to step S5.

If $\rho_{xy} > \Gamma_\rho$ does not hold (No in step S3), that is, if the edge intensity $\rho_{xy}$ is not greater than $\Gamma_\rho$, the spatio-temporal space richness calculation unit 200 skips the calculation of the edge direction $\theta_{xy}$ and proceeds to step S5.

A low edge intensity $\rho_{xy}$ may indicate that the pixel is strongly influenced by noise. Therefore, the predetermined threshold $\Gamma_\rho$ is used in step S3 so that no valid direction code is given to such pixels. For each pixel p(x, y) whose edge intensity $\rho_{xy}$ exceeds the threshold $\Gamma_\rho$, the spatio-temporal space richness calculation unit 200 calculates the edge direction $\theta_{xy}$ in step S4 according to Equation (3).

[Equation 3]

$$\theta_{xy} = \tan^{-1}\left(\frac{\Delta I_v}{\Delta I_u}\right) \tag{3}$$

In the next step S5, the spatio-temporal space richness calculation unit 200 calculates a direction code $C_{xy}$ from the calculated edge direction $\theta_{xy}$ according to Equation (4). If step S4 has been performed, that is, if $\rho_{xy} > \Gamma_\rho$ holds, the direction code $C_{xy}$ is the integer part of $\theta_{xy}/\Delta_\theta$. If step S4 has not been performed, that is, if $\rho_{xy} > \Gamma_\rho$ does not hold, the direction code $C_{xy}$ is set to $N = 2\pi/\Delta_\theta$.

In the expression $N = 2\pi/\Delta_\theta$, N is the quantization number obtained by dividing the full range of gradient directions, $2\pi$, by the quantization width $\Delta_\theta$. N is also the number of valid direction codes $C_{xy}$, one allocated to each divided gradient direction. For example, if N is 16 as shown in FIG. 4D, each valid direction code $C_{xy}$ is one of 0 (zero) through 15. If a pixel p(x, y) is determined in step S3 to have $\rho_{xy}$ not greater than $\Gamma_\rho$, its direction code is set to N, which is one greater than the largest valid code; with N = 16, $C_{xy}$ is 16. In this way, a pixel whose edge intensity $\rho_{xy}$ is not greater than the threshold $\Gamma_\rho$ is given the ineffective direction code (16 in this embodiment). This concludes the direction encoding of the image data.

As explained above, the spatio-temporal space richness calculation unit 200 functions as the “direction code calculation unit” recited in the claims.

[Equation 4]

$$C_{xy} = \begin{cases} \left\lfloor \dfrac{\theta_{xy}}{\Delta_\theta} \right\rfloor & \text{if } \rho_{xy} > \Gamma_\rho \\[6pt] N = \dfrac{2\pi}{\Delta_\theta} & \text{otherwise} \end{cases} \tag{4}$$
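Steps S1 to S5 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the embodiment's code. The Sobel kernels of Equations (1) are applied as a correlation with clamped borders. `math.atan2` is used in place of the plain $\tan^{-1}$ of Equation (3) so that $\theta_{xy}$ covers the full range $[0, 2\pi)$ implied by $N = 2\pi/\Delta_\theta$. The defaults N = 16 and $\Gamma_\rho$ = 10 are hypothetical.

```python
import math

# Sobel coefficients exactly as given in Equations (1).
FLT_H = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
FLT_V = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def apply_kernel(img, kernel, x, y):
    """Correlate a 3x3 kernel with img around (x, y); borders are clamped."""
    h, w = len(img), len(img[0])
    total = 0.0
    for j in range(3):
        for i in range(3):
            yy = min(max(y + j - 1, 0), h - 1)
            xx = min(max(x + i - 1, 0), w - 1)
            total += kernel[j][i] * img[yy][xx]
    return total


def direction_codes(img, n=16, gamma_rho=10.0):
    """Steps S1-S5: map of direction codes C_xy in 0..n-1, or n if ineffective."""
    delta_theta = 2 * math.pi / n
    h, w = len(img), len(img[0])
    codes = [[n] * w for _ in range(h)]          # default: ineffective code N
    for y in range(h):
        for x in range(w):
            du = apply_kernel(img, FLT_H, x, y)  # S1: horizontal gradient
            dv = apply_kernel(img, FLT_V, x, y)  # S1: vertical gradient
            rho = math.hypot(du, dv)             # S2: Equation (2)
            if rho > gamma_rho:                  # S3: threshold test
                # S4: Equation (3), via atan2 to obtain a direction in [0, 2*pi)
                theta = math.atan2(dv, du) % (2 * math.pi)
                # S5: Equation (4); min() guards against rounding up to n
                codes[y][x] = min(int(theta // delta_theta), n - 1)
    return codes
```

For a horizontal brightness ramp, the kernel $FLT_h$ yields a negative $\Delta I_u$ and $\Delta I_v = 0$. The pixel therefore gets $\theta_{xy} = \pi$ and receives code 8 of 16. A flat image yields only the ineffective code 16.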

FIG. 5A is a schematic illustration of a spatio-temporal space. FIG. 5B shows an example of a spatio-temporal histogram $P_{xyt}$ of the spatio-temporal space shown in FIG. 5A.

As shown in FIG. 5A, this spatio-temporal space contains M images, each lying in an x-y plane, arranged along the time axis t from time T−M to time T. In each image, a square region of L×L pixels is considered. In this way, a rectangular-parallelepiped spatio-temporal space S (not shown in the drawings) is defined by the L×L region in the x-y plane and the temporal length M.

FIG. 5B shows the spatio-temporal histogram $P_{xyt}$ of the space S, calculated according to Equations (5) and (6) below. In this histogram, the vertical axis represents the frequency at which each direction code appears, and the horizontal axis represents the direction code number, denoted i in FIGS. 5A and 5B.

In the next step S6, the spatio-temporal space richness calculation unit 200 produces the spatio-temporal histogram $P_{xyt}$, which shows the frequency at which each direction code appears. It does so by using the set of direction codes $C_{xyt}$, $(x, y, t) \in S$, calculated for the space S defined by the L×L region in the x-y plane and the temporal length M. To produce $P_{xyt}$, the unit 200 first calculates the count $h_{xyt}(i)$ of each direction code i according to Equation (5), where δ is the Kronecker delta.

[Equation 5]

$$h_{xyt}(i) = \sum_{(x, y, t) \in S} \delta\left(i - C_{xyt}\right) \tag{5}$$

The spatio-temporal space richness calculation unit 200 then calculates the spatio-temporal histogram $P_{xyt}$ as a relative frequency according to Equation (6). The denominator is the number of samples in the space S, $L^2 \times M$, minus the count $h_{xyt}(N)$ of pixels that have the ineffective direction code because their edge intensity $\rho_{xy}$ is not greater than the threshold $\Gamma_\rho$.

As explained above, the spatio-temporal space richness calculation unit 200 functions as the “spatio-temporal histogram calculation unit” recited in the claims.

[Equation 6]

$$P_{xyt}(i) = \frac{h_{xyt}(i)}{L^2 \times M - h_{xyt}(N)} \tag{6}$$
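Equations (5) and (6) amount to counting codes and normalizing by the number of effective samples. The minimal Python sketch below assumes that the codes of the space S have already been collected into a flat list of length $L^2 \times M$. The function name and the handling of a space with no effective codes are assumptions.

```python
def spatio_temporal_histogram(codes_in_s, n, L, M):
    """Equations (5) and (6) for one space S.

    codes_in_s: flat list of the L*L*M direction codes C_xyt in S, each in 0..n,
    where n is the ineffective code. Returns P(i) for i = 0..n-1.
    """
    assert len(codes_in_s) == L * L * M
    h = [0] * (n + 1)
    for c in codes_in_s:            # Equation (5): Kronecker-delta counting
        h[c] += 1
    valid = L * L * M - h[n]        # denominator of Equation (6)
    if valid == 0:                  # assumption: no effective codes -> empty histogram
        return [0.0] * n
    return [h[i] / valid for i in range(n)]
```

Because the ineffective code is removed from the denominator, the returned frequencies sum to 1 over the valid codes whenever at least one effective code is present.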

In the next step S7, the spatio-temporal space richness calculation unit 200 calculates a spatio-temporal space richness $R_{xyt}$. In this step, a spatio-temporal entropy $E_{xyt}$ is used as the evaluation criterion of the spatio-temporal histogram. The unit 200 determines the maximum entropy $E_{\max}$ according to Equation (7), which equals $\log_2 N$. It then calculates the spatio-temporal entropy $E_{xyt}$ and the spatio-temporal space richness $R_{xyt}$ according to Equations (8). The symbol $\alpha_e$ in Equations (8) is a weighting coefficient that sets the threshold $\alpha_e E_{\max}$ relative to the maximum entropy; it is set appropriately in accordance with the features of the image.

As explained above, the spatio-temporal space richness calculation unit 200 functions as the “spatio-temporal space evaluation value calculation unit” recited in the claims.

[Equation 7]

$$E_{\max} = -\sum_{i=0}^{N-1} \frac{1}{N} \log_2 \frac{1}{N} \tag{7}$$

[Equation 8]

$$E_{xyt} = -\sum_{i=0}^{N-1} P_{xyt}(i) \log_2 P_{xyt}(i), \qquad R_{xyt} = \begin{cases} \dfrac{E_{xyt} - \alpha_e E_{\max}}{E_{\max} - \alpha_e E_{\max}} & \text{if } E_{xyt} \geq \alpha_e E_{\max} \\[6pt] 0 & \text{otherwise} \end{cases} \tag{8}$$
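Equations (7) and (8) can be sketched in Python as below. The sketch assumes the usual entropy convention that empty bins contribute $0 \cdot \log_2 0 = 0$. $\alpha_e$ must be less than 1 for the denominator of $R_{xyt}$ to be positive. The function names are illustrative.

```python
import math


def max_entropy(n):
    """Equation (7): entropy of a uniform histogram over n codes (equals log2 n)."""
    return -sum((1.0 / n) * math.log2(1.0 / n) for _ in range(n))


def entropy(p):
    """Equation (8), E_xyt; empty bins contribute 0 (the 0 * log 0 = 0 convention)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def richness(p, alpha_e):
    """Equation (8), R_xyt; requires alpha_e < 1."""
    e_max = max_entropy(len(p))
    e = entropy(p)
    if e >= alpha_e * e_max:
        return (e - alpha_e * e_max) / (e_max - alpha_e * e_max)
    return 0.0
```

With N = 16, a uniform histogram gives $E_{xyt} = E_{\max} = 4$ bits and therefore $R_{xyt} = 1$. A histogram concentrated on a single code gives $E_{xyt} = 0$ and $R_{xyt} = 0$. These are the two extremes of the tendencies shown in FIG. 6A (mobile object) and FIG. 6B (background).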

When a mobile object appears in front of a background while the camera 10 is moved or shakes, the pixels p(x, y) in a space S located in the background area tend to have specific direction codes. This is shown in FIG. 6B, the spatio-temporal histogram of a background area. In this state, the entropy, and hence the spatio-temporal space richness $R_{xyt}$, becomes smaller.

In contrast, when a person or the like moves in the video image, direction codes of many different directions are detected, as shown in FIG. 6A. The entropy therefore becomes greater and the spatio-temporal space richness $R_{xyt}$ becomes higher.

The spatio-temporal space richness calculation unit 200 makes use of this principle to separate the mobile object from the background area. That is, in step S8, the unit 200 determines whether the condition $R_{xyt} > \Gamma_R$ holds, where $\Gamma_R$ is a threshold predetermined for the spatio-temporal space richness $R_{xyt}$ in the present embodiment.

If $R_{xyt} > \Gamma_R$ holds (Yes in step S8), i.e. if a mobile object is detected, the spatio-temporal space richness calculation unit 200 determines the position of a mobile object candidate in step S9 and then proceeds to step S10. More specifically, in step S9, the unit 200 considers each pixel p(x, y) in the region where the spatio-temporal space richness $R_{xyt}$ exceeds $\Gamma_R$. If the edge intensity of such a pixel exceeds the predetermined threshold $\Gamma_\rho$, the unit 200 regards the pixel as part of a mobile object candidate position $Obj_{xy}$.

If $R_{xyt} > \Gamma_R$ does not hold (No in step S8), the spatio-temporal space richness calculation unit 200 proceeds to step S10.

In step S10, the spatio-temporal space richness calculation unit 200 determines whether steps S8 and S9 have been completed for all the pixels of all the images.

If the spatio-temporal space richness calculation unit 200 determines that steps S8 and S9 have been completed for all the pixels of all the images (Yes in step S10), it finishes the process shown in FIG. 3.

If the spatio-temporal space richness calculation unit 200 determines that steps S8 and S9 have not been completed for all the pixels of all the images (No in step S10), it repeats the process for the unprocessed pixels.

In this manner, the spatio-temporal space richness calculation unit 200 performs steps S1 to S10 on all the pixels of the images in the space S.
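Putting steps S6 to S9 together, the following Python sketch slides an L×L×M window over a stack of direction-code maps (oldest first) and marks candidate positions $Obj_{xy}$ for the latest frame. Several assumptions are not taken from the text. The window is centred on each pixel, and border pixels whose window would leave the image are skipped (richness 0). L is odd. The step S9 condition is read as "$R_{xyt} > \Gamma_R$ and the pixel itself carries an effective code". Equation (7) is evaluated in closed form as $\log_2 N$.

```python
import math


def richness_map(code_maps, n, L, alpha_e):
    """Steps S6-S7 for every pixel: R_xyt over the L x L x M block centred on it.

    code_maps: list of M direction-code maps (oldest first), codes in 0..n.
    Border pixels whose window would leave the image keep richness 0 (assumption).
    """
    assert L % 2 == 1, "sketch assumes an odd window size"
    M = len(code_maps)
    h_img, w_img = len(code_maps[0]), len(code_maps[0][0])
    half = L // 2
    e_max = math.log2(n)                       # Equation (7) in closed form
    out = [[0.0] * w_img for _ in range(h_img)]
    for y in range(half, h_img - half):
        for x in range(half, w_img - half):
            hist = [0] * (n + 1)               # Equation (5)
            for frame in code_maps:
                for yy in range(y - half, y + half + 1):
                    for xx in range(x - half, x + half + 1):
                        hist[frame[yy][xx]] += 1
            valid = L * L * M - hist[n]        # denominator of Equation (6)
            if valid == 0:
                continue
            e = -sum((c / valid) * math.log2(c / valid) for c in hist[:n] if c)
            if e >= alpha_e * e_max:           # Equation (8)
                out[y][x] = (e - alpha_e * e_max) / (e_max - alpha_e * e_max)
    return out


def candidate_positions(code_maps, n, L, alpha_e, gamma_r):
    """Steps S8-S9: Obj_xy = 1 where R_xyt > Gamma_R and the pixel's own code
    in the latest frame is effective (its edge intensity exceeded Gamma_rho)."""
    r = richness_map(code_maps, n, L, alpha_e)
    latest = code_maps[-1]
    return [[1 if r[y][x] > gamma_r and latest[y][x] != n else 0
             for x in range(len(r[0]))] for y in range(len(r))]
```

A region whose codes all point the same way yields zero richness and no candidates. A region with varied codes (for example codes $(x + y) \bmod 4$ with N = 4) gives a richness close to 1 in the interior.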

Returning to FIG. 2, the process conducted by the frame subtraction unit 201 will be explained. In the present embodiment, a widely known method, i.e. the frame difference method, is used for calculating a candidate area of the mobile object. In the frame difference method, the difference between the image data of one frame and that of the preceding frame is obtained first, so that inter-frame changes occurring within a short period are detected. The candidate area of the mobile object is then calculated from these changes. This method can improve the accuracy of calculating the candidate area of the mobile object. Other methods, such as histogram matching, or image-processing techniques such as optical flow, may also be used for calculating the candidate area of the mobile object.

Prior to processing by the frame subtraction unit 201, the reference-still-image-producing unit 202 produces a reference image 203 from the preceding image data (alternatively, the reference image 203 may be prepared in advance). In order to reduce the workload, the preceding image is used as the reference image 203 in the present embodiment. The frame difference method is also advantageous for removing obviously recognizable background areas, since what it detects is limited to the mobile object candidate area and noise. The frame difference $Sub_{xy}$ is obtained as follows:

$$Sub_{xy} = \begin{cases} 1 & \text{if } |B_{xy} - I_{xy}| > \Gamma_{sub} \\ 0 & \text{otherwise} \end{cases}$$

where $B_{xy}$ represents the reference image 203, $I_{xy}$ represents the input image, and $Sub_{xy}$ represents the binarized image obtained by thresholding the absolute difference with the threshold $\Gamma_{sub}$.
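The binarized frame difference can be sketched in Python as below, using the preceding frame as the reference image 203 as in the embodiment. Images are assumed to be nested lists of gray levels, and the function names are hypothetical.

```python
def frame_difference(reference, current, gamma_sub):
    """Binarized frame difference: Sub_xy = 1 where |B_xy - I_xy| > Gamma_sub."""
    return [[1 if abs(b - i) > gamma_sub else 0 for b, i in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]


def frame_differences(frames, gamma_sub):
    """Use each preceding frame as the reference image B, as in the embodiment."""
    return [frame_difference(prev, cur, gamma_sub)
            for prev, cur in zip(frames, frames[1:])]
```

Using the preceding frame keeps the workload low, because no background model has to be maintained. The trade-off is that only changes between consecutive frames are detected.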

Next, the process conducted by the mobile-object-candidate-area-detecting unit 204 will be explained. First, $Obj_{xy}$ and $Sub_{xy}$ are calculated by the spatio-temporal space richness calculation unit 200 and the frame subtraction unit 201, respectively. As shown in Equation (9), the mobile object candidate area $D_{xy}$ is the logical conjunction (AND) of the area having a high spatio-temporal space richness and the area having a large frame difference. As a result, within the candidate area of the mobile object, $D_{xy}$ corresponds to the spatio-temporal space richness $R_{xyt}$.

[Equation 9]

$$D_{xy} = Obj_{xy} \wedge Sub_{xy} \tag{9}$$
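Equation (9) is a pixel-wise AND. In the Python sketch below, the optional `richness` argument reflects one reading of the statement that $D_{xy}$ corresponds to $R_{xyt}$ inside the candidate area: candidate pixels then carry their richness value instead of 1. That variant is an assumption, as are the function and argument names.

```python
def candidate_area(obj, sub, richness=None):
    """Equation (9): D_xy = Obj_xy AND Sub_xy.

    If a richness map R_xyt is supplied (an assumed variant), candidate pixels
    carry their richness value instead of 1, and other pixels carry 0.0.
    """
    out = []
    for y, (obj_row, sub_row) in enumerate(zip(obj, sub)):
        row = []
        for x, (o, s) in enumerate(zip(obj_row, sub_row)):
            hit = bool(o) and bool(s)
            if richness is None:
                row.append(1 if hit else 0)
            else:
                row.append(richness[y][x] if hit else 0.0)
        out.append(row)
    return out
```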

FIG. 7 is a block diagram showing the details of the mobile-object-detecting section 102.

The mobile-object-detecting section 102 includes a candidate-area-reshaping unit 700, a labeling unit 701, and a mobile object area calculating unit 702.

As explained above, the mobile object candidate area $D_{xy}$ has already been calculated. First, the candidate-area-reshaping unit 700 reshapes the mobile object candidate area $D_{xy}$. If the mobile object to be detected is a person, the mobile-object-detecting section 102 filters the entire image data with, for example, a Gaussian filter G having an elongated window of 10×20. It is thus preferable to perform the reshaping process in consideration of the size (area, dimensions, etc.) or the shape of the mobile object to be detected. Information regarding the space in which the video image is captured (e.g. the field of view of the camera 10) should also be considered. Equation (10) shows the reshaping process, i.e. the process of binarizing the candidate area.

[Equation 10]

$$Bin_{xy} = \begin{cases} 1 & \text{if } G * D_{xy} > thr \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
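A possible Python sketch of the reshaping step follows. The embodiment only specifies an elongated 10×20 Gaussian window. The following are all assumptions for illustration: σ is one quarter of each window dimension, G is implemented as a separable row-then-column pass with zero padding, 10×20 is read as width × height, and `thr = 0.3`.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Normalized 1-D Gaussian weights centred at (size - 1) / 2 (works for even sizes)."""
    c = (size - 1) / 2.0
    w = [math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    total = sum(w)
    return [v / total for v in w]


def smooth_separable(img, kx, ky):
    """Apply G as a horizontal pass then a vertical pass, with zero padding."""
    h, w = len(img), len(img[0])
    cx, cy = (len(kx) - 1) // 2, (len(ky) - 1) // 2
    tmp = [[sum(kx[k] * img[y][x + k - cx]
                for k in range(len(kx)) if 0 <= x + k - cx < w)
            for x in range(w)] for y in range(h)]
    return [[sum(ky[k] * tmp[y + k - cy][x]
                 for k in range(len(ky)) if 0 <= y + k - cy < h)
             for x in range(w)] for y in range(h)]


def reshape_candidates(d, win_w=10, win_h=20, thr=0.3):
    """Equation (10): Bin_xy = 1 where (G * D)_xy > thr, else 0."""
    kx = gaussian_kernel_1d(win_w, win_w / 4.0)
    ky = gaussian_kernel_1d(win_h, win_h / 4.0)
    smoothed = smooth_separable(d, kx, ky)
    return [[1 if v > thr else 0 for v in row] for row in smoothed]
```

Because the window is taller than it is wide, an upright, person-shaped blob survives the threshold while an isolated noise pixel is smoothed away.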

After that, the labeling unit 701 performs a labeling process on $Bin_{xy}$ to separate each mobile object, and then calculates the bounding rectangle and the area of each labeled mobile object.

The mobile object area calculating unit 702 has thresholds for the area, the aspect ratio, and the like, predetermined based on information regarding the camera installation conditions and information regarding the object to be detected. Using these thresholds, the unit 702 keeps the areas that correspond to the object to be detected and removes noise.
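The labeling unit 701 and the area and aspect-ratio check of unit 702 can be sketched in Python with a breadth-first, 4-connected component labeling. Connectivity, the use of height/width as the aspect ratio, and the threshold values in the test are assumptions. The embodiment leaves these to the camera installation conditions and the object to be detected.

```python
from collections import deque


def label_regions(bin_img):
    """4-connected component labeling of Bin_xy (breadth-first search).

    Returns one dict per region with its bounding rectangle (x0, y0, x1, y1)
    and its area in pixels.
    """
    h, w = len(bin_img), len(bin_img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for sy in range(h):
        for sx in range(w):
            if not bin_img[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0, y0, x1, y1, area = sx, sy, sx, sy, 0
            while queue:
                x, y = queue.popleft()
                area += 1
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and bin_img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append({"rect": (x0, y0, x1, y1), "area": area})
    return regions


def filter_regions(regions, min_area, aspect_range):
    """Keep regions whose area and height/width ratio fit the object to detect."""
    lo, hi = aspect_range
    kept = []
    for region in regions:
        x0, y0, x1, y1 = region["rect"]
        aspect = (y1 - y0 + 1) / (x1 - x0 + 1)
        if region["area"] >= min_area and lo <= aspect <= hi:
            kept.append(region)
    return kept
```

For example, with `min_area=5` and `aspect_range=(1.5, 4.0)`, an upright 2×5 block is kept as a person-like region, while a single-pixel region is discarded as noise.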

The mobile object area calculating unit 702 outputs the calculated area of the mobile object to the output section 20 and the recording section 103. The output section 20 can display, for example, the rectangle surrounding the mobile object. The recording section 103 can be controlled to record the video image only when a trespasser is detected.

FIG. 8 shows examples of images output according to the present embodiment.

The upper left window is an output image showing a rectangle surrounding a person. The upper right window shows an image of the visualized direction codes. The lower left window shows an image obtained by applying the Gaussian filter G to the spatio-temporal space richness $R_{xyt}$. This output represents the candidate area of the mobile object, consisting of the pixels whose spatio-temporal space richness $R_{xyt}$ exceeds the predetermined threshold. The lower right window is a binarized image obtained by labeling the candidate area of the mobile object, in which portion (C) indicates the result of the labeling process. Although the input video image shakes slightly in this example, the area of the person, i.e. the mobile object, is successfully detected.

INDUSTRIAL APPLICABILITY

The present invention can provide a video image monitoring system that can effectively detect a mobile object appearing in a captured video image, even when the background image and other camera conditions change continuously, by adopting a spatio-temporal space richness that evaluates the change in the direction code in the spatio-temporal space.
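The spatio-temporal space richness can be illustrated by the entropy of the direction-code histogram named in claim 2. In this sketch the brightness gradient is estimated with central differences, the gradient direction is quantized into eight codes whose bins are centered on multiples of 45°, and pixels whose gradient magnitude is below `min_mag` receive no code. These choices, the window radius, and the function names are assumptions made for illustration rather than details of the embodiment. In the usage example, a window whose codes are all identical yields 0 bits, while two frames with different codes yield 1 bit; pixels with a high richness are the ones treated as mobile object candidates.

```python
import math
from collections import Counter


def direction_codes(frame, n_codes=8, min_mag=1.0):
    """Quantize each pixel's brightness-gradient direction into one of n_codes codes.

    Gradients use central differences. Border pixels and pixels whose gradient
    magnitude is below min_mag (direction ill-defined) get None.
    """
    h, w = len(frame), len(frame[0])
    step = 2.0 * math.pi / n_codes
    codes = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y][x + 1] - frame[y][x - 1]) / 2.0
            gy = (frame[y + 1][x] - frame[y - 1][x]) / 2.0
            if math.hypot(gx, gy) < min_mag:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)  # in [0, 2*pi)
            codes[y][x] = int(round(angle / step)) % n_codes  # bins centered on k*step
    return codes


def richness(code_frames, x, y, radius=1):
    """Entropy in bits of the direction-code histogram over a spatio-temporal window.

    The window spans (2*radius+1) x (2*radius+1) pixels around (x, y) in every
    frame of code_frames (the T most recent frames up to time t).
    """
    hist = Counter()
    for codes in code_frames:
        for yy in range(y - radius, y + radius + 1):
            for xx in range(x - radius, x + radius + 1):
                if 0 <= yy < len(codes) and 0 <= xx < len(codes[0]) and codes[yy][xx] is not None:
                    hist[codes[yy][xx]] += 1
    total = sum(hist.values())
    if total == 0:
        return 0.0
    probs = [n / total for n in hist.values()]
    return sum(-p * math.log2(p) for p in probs)


# Usage: a horizontal brightness ramp (code 0) and a vertical ramp (code 2).
ramp_x = [[10.0 * x for x in range(5)] for y in range(5)]
ramp_y = [[10.0 * y for x in range(5)] for y in range(5)]
cx, cy = direction_codes(ramp_x), direction_codes(ramp_y)
print(richness([cx, cx, cx], 2, 2))  # identical codes: 0 bits
print(richness([cx, cy], 2, 2))      # two equally frequent codes: 1 bit
```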

EXPLANATION OF REFERENCE

-   1: video image monitoring system
-   10: camera (image-capturing device)
-   20: output section
-   30: recording medium
-   40: image recognition device
-   100: video-image-capturing section
-   101: mobile-object-candidate-area-detecting section
-   102: mobile-object-detecting section
-   103: recording section
-   200: spatio-temporal space richness calculation unit
-   201: frame subtraction unit
-   202: reference-still-image-producing unit
-   203: reference image
-   204: mobile-object-candidate-area-detecting unit
-   700: candidate-area-reshaping unit
-   701: labeling unit
-   702: mobile object area calculating unit

1. A video image monitoring system comprising: a video-image-capturing section configured to put out image data based on a video image signal obtained by using an image-capturing device; a mobile-object-candidate-area-detecting section configured to extract a candidate area of a mobile object from the image data; and a mobile-object-detecting section configured to determine whether the candidate area is the mobile object, wherein the mobile-object-candidate-area-detecting section comprises: a direction code calculation unit configured to calculate a direction code obtained by quantizing a brightness gradient direction of the image data; a spatio-temporal histogram calculation unit configured to calculate a spatio-temporal histogram which represents a frequency of the direction code appearing in a predetermined spatio-temporal space; and a spatio-temporal space evaluation value calculation unit configured to calculate a statistical spatio-temporal space evaluation value of the spatio-temporal histogram, and wherein the mobile-object-detecting section is configured to determine whether the candidate area is the mobile object based on the spatio-temporal space evaluation value.
2. The video image monitoring system according to claim 1, wherein the spatio-temporal space evaluation value calculation unit is configured to calculate an entropy of the spatio-temporal histogram as the spatio-temporal space evaluation value.
3. The video image monitoring system according to claim 1, wherein the mobile-object-candidate-area-detecting section has a frame difference unit configured to calculate a frame difference from a reference image, and wherein the mobile-object-candidate-area-detecting section is configured to detect the mobile object by evaluating the result of the candidate area together with the frame difference.
4. The video image monitoring system according to claim 1, wherein the mobile-object-detecting section has a candidate-area-reshaping unit configured to reshape the candidate area by considering the size of the mobile object to be detected or information regarding the space in which a video image is captured.
5. The video image monitoring system according to claim 3, wherein the mobile-object-candidate-area-detecting section has a reference-still-image-producing unit configured to produce the reference image based on the image data supplied from the video-image-capturing section.